Automated Labeling from Biomedical Journals published in Foreign Languages
Abstract
An automated labeling (AL) module has been developed to produce bibliographic record fields, such as the English title, vernacular title, author, affiliation, and English abstract, from biomedical articles published in foreign-language journals. The labeling process takes as input the optical character recognition (OCR) output from scanned biomedical journals. Because words that occur frequently in a zone are important features, word lists serve as key features in the AL module. The module's rule-based labeling algorithms draw on geometric and contextual features of zones and on geometric relations between zones. The algorithms use 131 rules derived for foreign-language journals. Experiments conducted with several medical journal articles show about 95% accuracy.
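The general idea of combining word lists with geometric features and relations between zones can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `Zone` fields, the two word lists, the thresholds, and the four toy rules are hypothetical and are not the 131 rules or the feature set of the AL module described above.

```python
from dataclasses import dataclass


@dataclass
class Zone:
    text: str         # OCR text of the zone
    top: float        # vertical position as a fraction of page height (0 = top)
    font_size: float  # font size in points, as reported by OCR


# Hypothetical word lists; a real system would derive them from labeled data.
AFFILIATION_WORDS = {"university", "department", "hospital", "institute", "clinic"}
ABSTRACT_WORDS = {"summary", "abstract", "background", "results", "conclusion"}


def word_hits(text, vocabulary):
    """Count distinct words of the zone that appear in a word list."""
    words = {w.strip(".,;:()").lower() for w in text.split()}
    return len(words & vocabulary)


def label_zone(zone, largest_font):
    """Apply toy rules based on geometric and word-list features of one zone."""
    if zone.top < 0.3 and zone.font_size >= largest_font:
        return "title"
    if zone.top < 0.5 and word_hits(zone.text, AFFILIATION_WORDS) >= 1:
        return "affiliation"
    if word_hits(zone.text, ABSTRACT_WORDS) >= 1:
        return "abstract"
    return "other"


def label_page(zones):
    """Label zones given in top-to-bottom reading order."""
    if not zones:
        return []
    largest_font = max(z.font_size for z in zones)
    labels = [label_zone(z, largest_font) for z in zones]
    # Toy geometric relation between zones: an otherwise unlabeled zone
    # directly below the title is taken to be the author line.
    for i in range(1, len(zones)):
        if labels[i - 1] == "title" and labels[i] == "other":
            labels[i] = "author"
    return labels


page = [
    Zone("Die Behandlung der Hypertonie", 0.10, 18),
    Zone("H. Müller, K. Schmidt", 0.18, 10),
    Zone("Department of Medicine, University of Bonn", 0.25, 9),
    Zone("Summary: Results show a clear benefit.", 0.40, 9),
]
labels = label_page(page)
```

In this sketch each rule is an ordered condition over a zone's features, so a zone receives the label of the first rule that matches; a production rule set would need many more rules and per-journal tuning.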
Related articles
Automated Labeling Algorithms for Biomedical Document Images
The National Library of Medicine (NLM) has developed an automated system, named Medical Article Records System (MARS), to process bibliographic data (title, authors, affiliation, abstract, etc.) in biomedical journal articles for its MEDLINE database. This paper describes a labeling module in MARS, which automatically extracts the bibliographic data in biomedical journal articles. The label...
Automated Labeling Of Biomedical Online Journal Articles
An automated labeling (AL) module has been developed to automate the extraction of bibliographic data (e.g., article title, authors, affiliation, abstract, and others) from online biomedical journals for the National Library of Medicine’s MEDLINE database. The AL module employs string matching, statistics, and fuzzy rule-based algorithms to identify segmented zones in an article’s HTML pages a...
Application of the CONSORT statement to randomized controlled trials comparing endoscopic and open carpal tunnel release.
BACKGROUND: The CONSORT (Consolidated Standards of Reporting Trials) statement was developed by a group of clinical trialists, biostatisticians, epidemiologists and biomedical editors as a means to improve the quality of reports of randomized controlled trials (RCTs). The purpose of the present study is to assess the reporting quality of published RCTs that compare endoscopic carpal tunnel relea...
Automated Labeling of Zones from Scanned Documents
The Lister Hill National Center for Biomedical Communications, a research and development division of the National Library of Medicine (NLM), is developing an automated system, the Medical Article Record System (MARS), to identify and convert bibliographic information from printed biomedical journals to electronic format for inclusion in the MEDLINE database. This paper describes one aspect of ...
Automated labeling of bibliographic data extracted from biomedical online journals
A prototype system has been designed to automate the extraction of bibliographic data (e.g., article title, authors, abstract, affiliation and others) from online biomedical journals to populate the National Library of Medicine’s MEDLINE® database. This paper describes a key module in this system: the labeling module that employs statistics and fuzzy rule-based algorithms to identify segmented ...